
Springer - Publisher Connector

arXiv.org e-Print Archive

A probabilistic model for gene content evolution with duplication, loss, and horizontal transfer

Author: A.B. Simonson
B. Boussau
B. Snel
B.E. Dutilh
B.G. Mirkin
C. Pál
C.G. Kurland
D.H. Huson
E. Belda
E.A. Herniou
E.D. Green
E.J. Deeds
E.L.L. Sonnhammer
E.V. Koonin
F. Delsuc
F. Tekaia
G.D.P. Clarke
G.P. Karev
I.K. Jordan
J. Lin
J.A. Lake
J.O. Korbel
J.P. Gogarten
J.T. Herbeck
K.H. Wolfe
M. Csűrös
M. Pellegrini
M.G. Montague
M.W. Hahn
R.L. Tatusov
S. Karlin
S. Yang
S.T. Fitz-Gibbon
T. Pupko
V. Kunin
W. Feller
W.J. Reed
X. Gu
Y. Boucher
Y.I. Wolf
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 27/09/2005

We introduce a Markov model for the evolution of a gene family along a phylogeny. The model includes parameters for the rates of horizontal gene transfer, gene duplication, and gene loss, in addition to branch lengths in the phylogeny. The likelihood for the changes in the size of a gene family across different organisms can be calculated in O(N + hM^2) time and O(N + M^2) space, where N is the number of organisms, h is the height of the phylogeny, and M is the sum of family sizes. We apply the model to the evolution of gene content in Proteobacteria using the gene families in the COG (Clusters of Orthologous Groups) database.
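As a rough illustration of the kind of Markov model this abstract describes, the sketch below treats family size as a continuous-time chain in which duplication and loss act per gene copy and horizontal transfer adds copies at a constant rate. That linear birth-death-gain form, the truncation at `max_size`, and the names `rate_matrix` and `transition_probs` are assumptions made for the example; the abstract does not specify them, and this is not the O(N + hM^2) likelihood algorithm it mentions.

```python
import math


def rate_matrix(max_size, dup, loss, gain):
    """Generator matrix for family size n in 0..max_size (assumed truncation)."""
    n_states = max_size + 1
    q = [[0.0] * n_states for _ in range(n_states)]
    for n in range(n_states):
        if n < max_size:
            q[n][n + 1] = gain + dup * n  # transfer or duplication adds a copy
        if n > 0:
            q[n][n - 1] = loss * n        # loss removes a copy
        q[n][n] = -sum(q[n])              # rows of a generator sum to zero
    return q


def transition_probs(q, t, terms=200):
    """Approximate exp(q * t) by uniformization (adequate for moderate rate * t)."""
    n = len(q)
    lam = max(-q[i][i] for i in range(n)) or 1.0
    # r is the stochastic matrix of the uniformized jump chain.
    r = [[(1.0 if i == j else 0.0) + q[i][j] / lam for j in range(n)]
         for i in range(n)]
    weight = math.exp(-lam * t)           # Poisson weight for zero jumps
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    p = [[weight * power[i][j] for j in range(n)] for i in range(n)]
    for k in range(1, terms):
        power = [[sum(power[i][m] * r[m][j] for m in range(n))
                  for j in range(n)] for i in range(n)]
        weight *= lam * t / k             # Poisson weight for k jumps
        for i in range(n):
            for j in range(n):
                p[i][j] += weight * power[i][j]
    return p
```

Entry `p[a][b]` approximates the probability that a family with `a` copies at the top of a branch of length `t` has `b` copies at the bottom; a full likelihood would combine such matrices over the whole phylogeny.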

CiteSeerX

Phylogeny of Prokaryotes and Chloroplasts Revealed by a Simple Composition Approach on All Protein Sequences from Complete Genomes Without Sequence Alignment

Author: C Lemieux
CR Woese
D Sankoff
DH Moreira
E Chatton
E Mayr
E Pennisi
F Tekaia
WM Fitch
GI McFadden
GW Stuart
J Adachi
J De Las Rivas
J Lin
J Qi
J.Q. Deng
JA Eisen
JD Palmer
JR Brown
K.H. Chu
KH Chu
L.Q. Zhou
M Li
M Turmel
MA Ragan
MW Gray
N Saitou
O Weiss
RF Doolittle
RL Charlebois
RS Gupta
S.C. Long
ST Fitz-Gibbon
SV Edwards
V.V. Anh
VL Stirewalt
W Martin
Z.G. Yu
ZG Yu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

The complete genomes of living organisms have provided much information on their phylogenetic relationships. Similarly, the complete genomes of chloroplasts have helped to resolve the evolution of this organelle in photosynthetic eukaryotes. In this paper we propose an alternative method of phylogenetic analysis using compositional statistics for all protein sequences from complete genomes. This new method is conceptually simpler than, and computationally as fast as, the one proposed by Qi et al. (2004b) and Chu et al. (2004). The same data sets used in Qi et al. (2004b) and Chu et al. (2004) are analyzed using the new method. Our distance-based phylogenetic tree of the 109 prokaryotes and eukaryotes agrees with the biologists' tree of life based on 16S rRNA comparison in the predominant majority of basic branchings and in most lower taxa. Our phylogenetic analysis also shows that the chloroplast genomes are separated into two major clades corresponding to chlorophytes s.l. and rhodophytes s.l. The interrelationships among the chloroplasts are largely in agreement with the current understanding of chloroplast evolution.
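To make the idea of an alignment-free, composition-based distance concrete, here is a minimal sketch that counts overlapping k-mers across all proteins of a genome and compares two genomes by a cosine-based distance. The function names, the plain k-mer frequencies, and the distance formula (1 - cosine) / 2 are assumptions chosen for illustration; the abstract does not state which compositional statistics or distance the authors actually use.

```python
import math
from collections import Counter


def kmer_frequencies(proteins, k):
    """Relative frequencies of overlapping k-mers over all protein sequences."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()} if total else {}


def composition_distance(freq_a, freq_b):
    """Distance in [0, 0.5] for non-negative vectors: (1 - cosine similarity) / 2."""
    dot = sum(v * freq_b.get(kmer, 0.0) for kmer, v in freq_a.items())
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("both genomes need at least one k-mer")
    return (1.0 - dot / (norm_a * norm_b)) / 2.0
```

A matrix of such pairwise distances could then be passed to any distance-based tree-building method, such as neighbor joining.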

Queensland University of Technology ePrints Archive

CandidaDB: a genome database for Candida albicans pathogenomics

Author: Albrecht Antje
Bader O.
Brown A. J. P.
Castillo L.
d'Enfert C.
de Groot P.
Dominguez A.
Ernst J. F.
Fradin C.
Frangeul L.
Gaillardin C.
Garcia-Sanchez S.
Goyard S.
Hube B.
Jones L.
Klis F. M.
Krishnamurthy S.
Kunze D.
Lopez M.-C.
Martin J. Perez
Martin N.
Mavor A.
Moszer I.
Onésime D.
Rodriguez-Arnaveilhe S.
Sentandreu R.
Tekaia F.
Valentin E.
Publication venue: Oxford University Press
Publication date: 17/12/2004

CandidaDB is a database dedicated to the genome of the most prevalent systemic fungal pathogen of humans, Candida albicans. CandidaDB is based on an annotation of the Stanford Genome Technology Center C.albicans genome sequence data by the European Galar Fungail Consortium. CandidaDB Release 2.0 (June 2004) contains information pertaining to Assembly 19 of the genome of C.albicans strain SC5314. The current release contains 6244 annotated entries corresponding to 130 tRNA genes and 5917 protein-coding genes. For these, it provides tentative functional assignments along with numerous pre-run analyses that can assist the researcher in the evaluation of gene function for the purpose of specific or large-scale analysis. CandidaDB is based on GenoList, a generic relational data schema and a World Wide Web interface that has been adapted to the handling of eukaryotic genomes. The interface allows users to browse easily through genome data and retrieve information. CandidaDB also provides more elaborate tools, such as pattern searching, that are tightly connected to the overall browsing system. As the C.albicans genome is diploid and still incompletely assembled, CandidaDB provides tools to browse the genome by individual supercontigs and to examine information about allelic sequences obtained from complementary contigs. CandidaDB is accessible at http://genolist.pasteur.fr/CandidaDB

Aberdeen University Research Archive

International Migration, Integration and Social Cohesion online publications

Hal-Diderot

Beyond representing orthology relations by trees

Author: A Tofigh
AM Altenhoff
C Semple
Consortium T.G.O.
D Huson
D Wen
E Jacox
F Tekaia
G Jin
G. E. Scholz
J Jun
K Chen
K. T. Huber
KT Huber
L Nakhleh
LJJ van Iersel
M Hellmuth
M Lafond
M Stolzer
MS Bansal
O Mahmudi
P Gambette
P Górecki
R Tatusov
S Böcker
S Willson
Y Ovadia
Y Yu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 09/11/2016

Reconstructing the evolutionary past of a family of genes is an important aspect of many genomic studies. To help with this, simple relations on a set of sequences called orthology relations may be employed. In addition to being interesting from a practical point of view, they are also attractive from a theoretical perspective in that, e.g., a characterization is known for when such a relation is representable by a certain type of phylogenetic tree. For an orthology relation inferred from real biological data it is, however, generally too much to hope that it satisfies that characterization. Rather than trying to correct the data in some way, which has its own drawbacks, we propose to represent an orthology relation δ in terms of a structure more general than a phylogenetic tree, called a phylogenetic network. To compute such a network in the form of a level-1 representation for δ, we formalize an orthology relation in terms of the novel concept of a symbolic 3-dissimilarity, which is motivated by the biological concept of a "cluster of orthologous groups", or COG for short. For such maps, which assign symbols rather than real values to elements, we introduce the novel Network-Popping algorithm, which has several attractive properties. In addition, we characterize an orthology relation δ on some set X that has a level-1 representation in terms of eight natural properties for δ, as well as in terms of level-1 representations of orthology relations on certain subsets of X.

Springer - Publisher Connector

University of East Anglia digital repository

Natural History, Microbes and Sequences: Shouldn't We Look Back Again to Organisms?

Author: A Lazcano
Antonio Lazcano
AP Vogler
AT Peterson
C Becerra-Bracho
CK Yoon
CR Woese
D Young
E Mayr
E Stackebrandt
E Zuckerkandl
F Tekaia
GE Hutchinson
GHF Nuttall
J Qin
JS Wilkins
KE Nelson
L Koerner
NR Pace
PDN Hebert
R Kays
R Li
R Rossello-Mora
RK Trench
Sergios-Orestis Kolokotronis
WB Whitman
Publication venue: Public Library of Science
Publication date: 01/01/2011

The discussion on the existence of prokaryotic species is reviewed. The demonstration that several different mechanisms of genetic exchange and recombination exist has led some to a radical rejection of the possibility of bacterial species and, in general, of the applicability of traditional classification categories to the prokaryotic domains. However, in spite of intense gene traffic, prokaryotic groups are not continuously variable but form discrete clusters of phenotypically coherent, well-defined, diagnosable groups of individual organisms. Molecularization of life sciences has led to biased approaches to the issue of the origins of biodiversity, which has resulted in an increasingly widespread tendency to emphasize genes and sequences and not give proper attention to organismal biology. As argued here, molecular and organismal approaches should be seen as complementary and not opposed views of biology.

Public Library of Science (PLOS)

Red Mexicana de Repositorios Institucionales

A novel series of compositionally biased substitution matrices for comparing Plasmodium proteins

Author: A Bahl
AP Gasch
E Pizzi
Elisabetta Pizzi
F Chen
F Tekaia
H Musto
HY Xue
J Casasnovas
JC Aude
JC Wootton
JE Coronado
JG Henikoff
JM Carlton
Kevin Brick
M Dayhoff
M Petter
MJ Gardner
N Joannin
O Bastien
P Rice
PC Ng
Q Cheng
RD Knight
S Henikoff
SF Altschul
T Muller
World Health Organization
WS Torgerson
YK Yu
Publication venue: BioMed Central
Publication date: 01/05/2008

Background: The most common substitution matrices currently used (BLOSUM and PAM) are based on protein sequences with average amino acid distributions; thus they do not represent a fully accurate substitution model for proteins characterized by a biased amino acid composition. This problem has been addressed recently by adjusting existing matrices; however, to date, no empirical approach has been taken to build matrices which offer a substitution model for comparing proteins sharing an amino acid compositional bias. Here, we present a novel procedure to construct series of symmetrical substitution matrices to align proteins from similarly biased Plasmodium proteomes. Results: We generated substitution matrices by selecting from the BLOCKS database those multiple alignments with a compositional bias similar to that of P. falciparum and P. yoelii proteins. A novel 'fuzzy' clustering method was adopted to group sequences within these alignments, showing that this method retains more complete information on the amino acid substitutions when compared to hierarchical clustering. We assessed the performance against the BLOSUM62 series and showed that the usage of our matrices results in an improvement in the performance of BLAST database searches, greatly reducing the number of false positive hits. We then demonstrated applications of the novel matrices to improve the annotation of homologs between the two Plasmodium species and to classify members of the P. falciparum RIFIN/STEVOR family. Conclusion: We confirmed that in the case of compositionally biased proteins, standard BLOSUM matrices are not suited for optimal alignments, and specific substitution matrices are required. In addition, we showed that the usage of these matrices leads to a reduction of false positive hits, facilitating the automatic annotation process.

AIR Universita degli studi di Milano

Springer - Publisher Connector

Public Library of Science (PLOS)

A Combination of Compositional Index and Genetic Algorithm for Predicting Transmembrane Helical Segments

Author: A Krogh
A Thomas
B Rost
E Falkenauer
E Wallin
EL Sonnhammer
F Tekaia
G Tusnady
G von Heijne
GE Tusnady
H Berman
H Shen
H Zhou
J Holland
J Pylouster
JM Cuthbertson
L Kall
M Cserzo
M Suyama
MG Claros
Nazar Zaki
Pierandrea Temussi
R Garey
RY Kahsay
S Hosseini
S Jayasinghe
S Roy
Salah Bouktif
Sanja Lazarova-Molnar
T Hirokawa
T Nugent
T Taylor
Publication venue: Public Library of Science
Publication date: 01/01/2011

Transmembrane helix (TMH) topology prediction is becoming a focal problem in bioinformatics because the structure of TM proteins is difficult to determine using experimental methods. Therefore, methods that can computationally predict the topology of helical membrane proteins are highly desirable. In this paper we introduce TMHindex, a method for detecting TMH segments using only the amino acid sequence information. Each amino acid in a protein sequence is represented by a Compositional Index, which is deduced from a combination of the difference in amino acid occurrences in TMH and non-TMH segments in training protein sequences and the amino acid composition information. Furthermore, a genetic algorithm was employed to find the optimal threshold value for the separation of TMH segments from non-TMH segments. The method successfully predicted 376 out of the 378 TMH segments in a dataset consisting of 70 test protein sequences. The sensitivity and specificity for classifying each amino acid in every protein sequence in the dataset were 0.901 and 0.865, respectively. To assess the generality of TMHindex, we also tested the approach on another standard 73-protein 3D helix dataset. TMHindex correctly predicted 91.8% of proteins based on TM segments. The level of accuracy achieved using TMHindex in comparison to other recent approaches for predicting the topology of TM proteins is a strong argument in favor of our proposed method. Availability: The datasets and software, together with supplementary materials, are available at: http://faculty.uaeu.ac.ae/nzaki/TMHindex.htm
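The general idea of scoring residues by how differently they occur in TMH and non-TMH training segments can be sketched as follows. This is only an illustration under stated assumptions: the smoothed log-ratio score, the sliding-window average, the fixed `threshold` argument (the abstract tunes its threshold with a genetic algorithm), and the function names are all hypothetical and are not taken from TMHindex.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def compositional_index(tmh_segments, non_tmh_segments, pseudocount=1.0):
    """Assumed score per residue: log ratio of smoothed TMH to non-TMH frequency."""
    tmh = Counter("".join(tmh_segments))
    non = Counter("".join(non_tmh_segments))
    tmh_total = sum(tmh[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    non_total = sum(non[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {
        a: math.log(((tmh[a] + pseudocount) / tmh_total)
                    / ((non[a] + pseudocount) / non_total))
        for a in AMINO_ACIDS
    }


def predict_segments(sequence, index, threshold, window=5):
    """Return half-open (start, end) runs whose windowed mean score exceeds threshold."""
    half = window // 2
    scores = []
    for i in range(len(sequence)):
        win = sequence[max(0, i - half):i + half + 1]
        scores.append(sum(index.get(a, 0.0) for a in win) / len(win))
    segments, start = [], None
    for i, score in enumerate(scores):
        if score > threshold and start is None:
            start = i                      # a candidate segment opens here
        elif score <= threshold and start is not None:
            segments.append((start, i))    # the segment closes before position i
            start = None
    if start is not None:
        segments.append((start, len(sequence)))
    return segments
```

In a setup like the one the abstract describes, the threshold would be chosen by an optimizer against labeled training data rather than passed in by hand.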

CiteSeerX